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Abstract 

Lifetime  data  classified  according  to  categorical  variables  under  the  proportionality 
of  the  hazard  functions  of  response  variables  for  various  treatment  combinations  is 
assumed.  The  proposed  model  is  a  combination  of  Cox's  proportional  hazards 
model  and  ANOVA  model.  The  existence  of  a  solution  to  the  marginal  likelihood 
function  is  examined  for  the  case  of  2x2  two-way  classification.  We  provide  an 
easily  verifiable  condition  for  the  existence  of  a  unique  estimate. 

Key  Words:  Hazard  function,  Cox's  Proportional  Hazards  model,  Marginal 
likelihood.  Convex  hull. 


1  Introduction 

The  Cox  Proportional  Hazards  model'  has  enjoyed  enormous  popularity  among 
statisticians  for  assessing  the  influence  of  various  factors  on  survival  time.  The 
Science  Citation  Index  indicates  that  by  the  end  of  1993  there  were  over  7000 
references  to  that  article  which  makes  it  one  of  the  most  frequently  cited  articles, 
Henderson.^  Let  us  look  at  the  basic  idea  behind  this  model. 

Let  ?:(/  =  1, be  independent  continuously  distributed  random  variables 
representing  the  times  of  death  of  n  individuals  and  suppose  there  exists  a  censoring 
time  C/  associated  with  each  individual.  Under  Cox's  PH  model,  the  /th  individual 
hazard  rate 

A,,(/)  =  Hm  Pr[7;  </  +  5|  7;  >/] 

is  of  the  special  form 


^/(0  =  >^o(0  exp(P'x,.), 

where  x/  is  a  column  vector  of  p  covariates,  p'  is  the  transpose  vector  of  their 
corresponding  unknovm  coefficients,  and  finally  Xo(/)  represents  a  fixed  unknown 
baseline  hazard  rate  for  individuals  with  x.  The  observed  data  for  the  /th  individual, 
consist  of  min(7;  ,C,  ),  5,-  =I(7;.  <C,)  and  x/  where  !(•)  is  the  indicator  function. 
Furthermore,  note  that  the  probability  density  function  (pdf)  of  7}  is  uniquely 
determined  by  its  hazard  function  as 


y;(/;p)  =  Xo(0  exp(P'x;)  exp 


-exp(p'x,)  \X^{u)du 


{0,1] 


/>0, 


The  objective  is  to  make  inferences  on  the  parameters  of  interest,  namely  P’s.  If 
afunctional  form  is  assumed  for  one  can  rely  on  the  maximum  likelihood 

method  of  estimation  and  the  corresponding  asymptotic  results  to  draw  appropriate 
inferences.  However,  the  maximum  likelihood  estimation  of  the  parameters  is 
fraught  with  many  difficulties.  Alternatively,  if  no  functional  form  is  assumed  for 
one  needs  to  rely  on  non-parametric  techniques.  Consequently,  we  require 
an  estimating  function  that  is  purely  a  function  of  p’s.  Thus,  Cox'  observed  that 
distribution  of  the  relative  positions,  i.e.  ranks,  of  the  observations  is  constant  in  time 
and  furthermore  is  entirely  a  function  of  covariates.  This  is  the  nub  of  the  marginal 
likelihood  principle. 

There  are  two  ways  of  recording  the  relative  positions  in  a  given  set  of  data.  One 
way  is  to  consider  the  rank  of  Tj  which  is  /?,  =  #{l  <  y  <  «  :  7).  <  ?;}  and  under  the 

absolutely  continuous  assumption  of  Tj’s  the  ranks  are  a  permutation  of  1,2, •••,n 
almost  surely.  Another  way  of  identifying  the  location  of  the  data  is  in  terms  of  the 
label  associated  with  /th  order  statistics,  denoted  by  5,  .  Note  that  the  rank  vector, 

R  =  (/?i,7?2»  '  »^m)  j  and  the  label  vector,  S  =  (5'i,iS'2,  -*,iS„),  determine  each  other 
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uniquely.  As  a  matter  of  fact,  the  permutation  R  is  the  inverse  permutation  of  S  if  we 
view  R  and  S  as  maps  from  (1,2, •••,«} .  More  precisely, 

We  now  focus  on  the  distribution  of  R  or,  equivalently  that  of  S.  The  number 
of  possible  values  of  R  is  n\.  Let  r  =  (rj,/2,-",r„)  be  a  permutation  of  l,2,-”,n 
and  s  =  (5i,5’2,---,5„)  be  the  inverse  of  r.  Let  /)(•)  be  the  pdf  of  7],  /  =  l,2,---rt. 
Then, 

P(R  =  r)  =  P{S  =  s) 

=  1-1  (I) 

0<rj<-<f^<oo 

e’<p(x;,|i)  »  exp(i;,p) 

This  probability  also  can  be  written  in  terms  of  the  so-called  risk  sets 
9l(/)  =  which  represents  the  set  of  individuals  that  are  alive  just  before 

the  ith  death.  The  above  probability  is  referred  to  as  the  marginal  likelihood  Lmi-)- 
This  approach  was  originally  proposed  by  Cox'  and  further  clarified  by  Cox,’  and 
Kalbfleisch  and  Prentice,*  and  also  discussed  by  Miller,^  Lawless’  and  Cox  and 
Oakes.’  Realistically  some  observations  could  be  censored,  hence  Equation  (l)  is 
modified  by  considering  taking  the  product  over  the  non-censored  observations,  i.e., 
replacing  n  by  total  number  of  failures,  k. 
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It  should  be  noted  that  the  Equation  (1)  can  also  be  obtained  through  partial 
likelihood  argument,  see  Andersen'*^.  The  maximum  marginal  likelihood  estimators, 

P ,  are  the  solution  to  the  score  equations  generated  by  V  log  (P)  =  0 .  The 
estimators  have  been  shown  to  be  asymptotically  normal  with  mean  Pq  and 

covariance  matrix  of  E  =  i;'  where  I,,  = -V- V'logZ„(P)  is  the  observed 
information  matrix  and  V  refers  to  the  gradient  vector  operator. 

In  this  paper,  we  explore  the  above  model  in  the  context  of  a  2x2  classification 
which  is  one  the  most  commonly  used  design  scheme  in  cohort  studies.  Arani  and 
Rao^  provided  a  necessary  and  sufficient  condition  for  the  existence  of  a  unique 
solution  to  likelihood  equations  in  the  case  of  2x2  classification  design  when  the 
baseline  hazard  function  corresponds  to  the  exponential  distribution.  These 
conditions  were  derived  by  utilizing  the  results  provided  by  Makelainen  et  al*  In  the 
next  section,  we  provide  a  necessary  and  sufficient  condition  for  existence  of  a 
unique  maximum  marginal  (or  partial)  likelihood  estimates,  first  for  the  case  of  single 
observation  in  each  cell,  followed  by  extension  to  multiple  observations.  Finally  in 
the  last  section,  the  implementation  of  the  results  are  illustrated  through  an  example. 


2  Existence  of  a  Unique  Solution 

There  is  no  guarantee  that  the  maximum  likelihood  estimate  based  on  the  marginal 
likelihood  of  the  ranks  exists  and  is  unique.  Thus,  before  exploring  the  asymptotic 
properties  of  the  estimator  the  question  of  existence  and  uniqueness  must  be  resolved. 
Note  that  Equation  (l)  can  be  rewritten  as'® 
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k 

n 

/=! 


1  +  2  exp((x} 


(2) 


Jacobsen”  '^  showed  L„{^)  is  strictly  concave  if  and  only  if  the  contrast  covariate 
vectors, 

Xy-Xj,  ,  for  every  /  =  1,2, •••,/:  and  y  eiR(/)- {/},  (3) 

span  the  parameter  space  0.  Additionally,  he  showed  that  p  exists  and  is  unique  if 
and  only  if  0  belongs  to  the  interior  of  the  convex  hull  of  the  contrast  covariate 
vectors,  Xy  -  Xj^  ,  /  =  1,2,-”,^,  and  y  e9l(/)  -  |/J,  (Note  that  the  bold  characters 
refer  to  vectors  or  matrices.) 


First,  let  us  consider  the  following  2x2  design  with  single  observation  in  each 
cell. 


Drug  B 


Drug  A 


Dose  Level 

th 

Tn 

Tx2 

Tix 

T22 

such  that  Ty  and  x,y  denote  the  lifetime  and  the  covariate  vector  associated  with  the 
individual  who  has  been  administered  the  /th  dose  level  of  Drug  A  and  the  yth  dose 
level  of  Drug  B.  The  hazard  function  of  Ty  is  proposed  to  be 

Py)’  ^  >  0;  /  =  1,2  and  y  =  1,2 
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subject  to  tti  +  a2  =  0  and  pj  +  p2  =  0 .  These  constraints  are  imposed  to  avoid  any 
nonidentifiability  problem.  Thus,  it  suffices  to  estimate  the  parameter  vector 
(ai,P|).  Under  the  above  design  the  covariate  vectors,  Xy,  are  identified  to  be 
x|,=(l,l),  x',2=(l-l),  x^i  =(-1,1)  and  x^2=(-l»-0-  simplicity,  let 

(a|,P|)  =  (ot,P)  =  P'  and  label  the  indices  (1,1) ,  (1,2) ,  (2,1)  and  (2,2)  by  1,2,3, 
and  4,  respectively. 

Let  us  consider  the  ideal  case  of  no  censoring  with  no  ties.  Let  R\  be  the  rank  of 
7]  I ,  i?2  the  rank  of  7|2 ,  the  rank  of  T^i ,  and  /?4  the  rank  of  T22  •  Note  that  the 
rank  vector  R  =(i?|,/?2>^3»^4)  can  assume  4!  =  24  possible  values.  For  the 
purpose  of  illustration,  let  us  examine  the  necessary  and  sufficient  conditions  for  the 
existence  of  a  unique  estimate  for  the  special  case  of  R  =  (1,3,4 ,2),  i.e.,  S  =  (l,4,2,3) . 

Thus,  the  risk  sets  are  iR(l)  =  {l,4,2,3} ,  91(2)  =  {4,2,3} ,  91(3)  =  {2,3}  and 

91(4)  =  {3}.  The  contrast  vectors  are  given  by  xj  -  xj  =  (-2,-2) ,  Xj  -  xj  =  (0,-2) , 

X3  -  x;  =  (-2,0) ,  Xj  -  X4  =  (2,0) ,  X3  -  X4  =  (0,2)  and  Xj  -  Xj  =  (-2,2) .  Note  that 

clearly  the  contrast  vectors  span  the  parameter  space  and  further  more  the  convex  hull 
generated  by  them  contains  zero  which  are  indicative  of  a  unique  solution  to 
likelihood  equations.  Similarly  for  each  case  of  R,  one  can  examine  the  necessary 

and  sufficient  conditions  for  the  existence  of  a  unique  estimate  of  (a,  p)  . 

It  is  instructive  to  examine  when  we  have  solutions  for  certain  trails  of  deaths,  but 
not  for  others.  The  order  in  which  the  individuals  die  is  the  key  to  an  understanding 
of  this  phenomenon.  Following  an  exhaustive  search,  it  can  be  established  that  the 
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only  time  the  optimal  estimates  exist  is  when  the  trail  of  deaths  always  moves 
crosswise  as  shown  in  Figure  1 . 


Scenario  I  Scenario  2 


)4 

X 

Scenario  5 

Scenario  6 

X 

X 

Scenario  3  Scenario  4 


X 

X 

Scenario  7 

Scenario  8 

K 

X 

Figure  1.  Trails  of  deaths  under  which  estimator  exist. 


In  all  other  scenarios,  the  trail  of  deaths  moves  either  horizontally  or  vertically  (i.e.,  a 
zigzag  pattern  such  as,  the  one  corresponding  to  R  =  (2,4,1,3)  ),  which  is  not 
conducive  to  a  unique  solution  of  the  marginal  likelihood  equations.  Using  Equation 
1 ,  probability  of  occurrence  of  each  admissible  scenario  can  be  obtained,  but  for  the 
sake  of  efficiency  the  algebraic  detail  is  deferred  to  the  appendix.  Thus,  It  readily 
can  be  shown  that  a  unique  solution  to  likelihood  equations  exists  with  a  positive 
probability  of 
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^0  =  X  scenario)  = 

m=\ 


Iv 

j 


W:  -  W 


(4) 


where  i,J  - 1,-  •  •  ,4  and  w^  =  exp(a  +  P),  W2  =  exp(a  -  P),  ^3  =  exp(-a  +  p),  and 
W4  =  exp(-a  -  P) . 
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Up  to  this  point,  we  assumed  that  all  the  failure  times  are  observed  (i.e.,  censoring 
is  not  present)  which  might  not  be  feasible.  In  the  presence  of  censoring  the  same 
line  of  reasoning  can  be  followed  to  establish  easily  the  scenarios  under  which  a 
unique  solution  exists.  Let  A,y  =  <  Cy)  be  a  Bernoulli  random  variable  (as 

defined  in  Section  1)  representing  the  censoring  status  of  the  individual  who  has 

received  /th  and  yth  dose  level  of  Drug  A  and  Drug  B,  respectively.  All  admissible 

scenarios  for  each  possible  value  of  A..  =  =  04»2,3,4  are  listed  in  Table  1, 

i  j 


and  furthermore  it  follows  that 


/’(A..  =2)  = 


W/ 


=  2Pn 


and 


?(A..  =  3)  =  3Po. 


Furthermore,  the  above  results  are  extended  to  the  case  of  ny  individuals  in  ijth 
cell  in  the  following  theorem.  Letting  '^ijk  be  the  lifetime  of  the  Ath  individual  who 
has  been  administered  /th  dose  level  of  Drug  A  and  yth  dose  level  of  Drug  B,  where 
i  =  1,2,  j  =  1,2  and  k  =  \,2,---,ny  and  we  have. 

Theorem  Let  Tyj^ ,  where  /  =  1,2  ,  j  =  1,2  and  k  -  \,2,---,ny,  be  independent 
positive  random  variables,  and  ^1)  <  ^2)  <■■■<  )  be  the  corresponding  order 

statistics.  There  exists  a  unique  solution  for  the  marginal  likelihood  equations,  if 
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there  exists  i'l  <  ^2  <  -^3  <  ^4  such  that  ^(^l)  ^(S2)  <  ^(^3)  ^  ^(S4)  follows  one  of  the 

schemes  in  Table  1 . 

Proq/i  Given  ^(il)  <  ^(i2)  ^  ^(Ss)  ^(S4)  follows  one  of  the  admissible  scenarios, 

without  loss  of  generality  first  scenario  is  assumed,  thus  the  labels  of 
^(^2)’  ^(^3)  and  are  to  be  llAj,  22^2 >  12^:3  and  21^:4  respectively  for 

some  1  <  <  n, I  ,  1<^2  -'^22-  1  ^  ^3  ^  «12 »  and  1  <  ^4  <  /J21 .  The  risk  sets  91(/|) , 
^('2)  ^Os)  and  9^(14)  associated  with  the  entire  data  and  the  risk  sets  91(1),  91(2) 
91(3)  and  91(4)  associated  with  the  particular  segment 


A'-.B 

«l 

T^\k^ 

^2*3 

02 

^1*4 

hlk-. 

of  the  data  have  the  following  relation  91(r)  c  91(5^)  for  r  =  1,2,3,4 .  Consequently, 
the  contrast  covariate  vectors  associated  with  the  entire  data  set  will  contain  the 
contrast  covariate  vectors  associated  with  the  above  data  segment.  Afortiori,  the 
marginal  likelihood  equations  associated  with  the  entire  data  are  uniquely  solvable  in 
p.  □ 

Further  more  defining, 

A  =  |no  admissable  trail  in  any  4  adjacent  positions} 

5  =  (no  admissable  trail  in  any  subsequence}. 

It  is  followed  that  Bq  A  and  the  following  inequality  follows  immediately 

P(B)  <  P(A)  =  (1  -  . 
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Hence,  a  unique  solution  to  the  marginal  likelihood  equations  exists  with  probability 
converging  to  1  exponentially  as  min  n,-,-  oo . 

ij 

It  should  be  noted  that  the  above  results  were  presented  in  the  case  with  no 
censoring  to  preserve  continuity  in  the  text.  Moreover,  the  results  can  be  extended  to 
case  with  censoring  by  replacing  with  Y^j|^  =  min(7)yjt ,  )  • 

3  Discussion 

As  noted  earlier,  the  Cox’s  regression  model  is  used  frequently  to  analyze  survival 
data  augmented  by  some  additional  information.  Naturally,  this  model  has  been 
incorporated  in  many  software  packages  such  as  PHREG  procedure  in  SAS.  One 
practical  advantage  of  the  obtained  results  is  that  they  can  be  utilized  as  a  diagnostic 
tool  to  identify  any  unrealistic  and  misleading  results.  In  situations  with  no 
admissible  scenario,  one  can  conclude  inappropriateness  of  the  proposed  additive 
model  or  a  need  for  additional  observation  to  be  taken.  Clearly,  in  the  case  of 
historical  data,  second  suggestion  is  not  viable.  Thus,  alternative  models  need  to  be 
explored. 

For  the  sake  of  illustration,  we  consider  the  data  obtained  by  Edmunson  et  al}^ 
for  which  the  objective  was  to  study  the  effect  of  two  chemotherapy  treatments, 
namely  cyclophasphamide  alone  or  its  combination  with  adriamycin,  after  surgical 
treatment  of  ovarian  cancer.  A  total  of  26  women  who  had  experienced  surgical 
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excision  of  all  tumor  were  considered.  The  patients  were  also  classified  according  to 
whether  residual  disease  was  completely  or  partially  excised.  Patients  were  randomly 
assigned  to  one  the  chemotherapy  treatments.  The  data  can  be  classified  according  to 
levels  of  excision  and  treatment  status  as  follow. 


cyclophasphamide 

cyclophasphamide  + 
adriamycin 

Partial 

638,1106+, 855+, 

1 227+,  1129+, 563, 744+, 

excision 

803+,448+ 

353,377+ 

Complete 

156,l040+,59,329,268, 

42 1+,769+, 365,770+, 475, 

excision 

431,115.477+ 

64,1206+ 

Figure  2.  Survival  times  after  randomization  to  treatment,  +  represents  Censored  observation. 


The  trail  of  deaths  according  to  their  cell  number  for  the  above  data  set  is 


►x— ►x— >o->o-^X— ►o— >X— >X— ^o->X  — ►••• 

2  2  2  2  2  J  4  3  4  2  I  4  4  2  3 

— ►  X  — >  o  — >  O  — ^  O  — >  O  — >  O  — >  O  — >  O  — >  O  — ►  o  — >  o 

I  3441  121343 

Clearly  one  can  find  a  subsequence  (indicated  by  »)  that  satisfies  at  least  one  of  the 
admissible  scenarios,  in  Figure  1  such  as  the  Scenario  1 .  This  result  in  some  sense 
(since  full  likelihood  is  not  used  additional  conditions  are  imposed)  supports  the 
results  obtain  by  Arani  and  Rao^  in  the  parametric  case  when  the  baseline  hazard  is 
assumed  to  be  constant.  That  is,  in  addition  to  requiring  the  at  least  two  observations 
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in  the  diagonal  cells,  certain  order  is  imposed  on  the  survival  times.  Thus,  it  is 
concluded  that  any  estimate  based  on  marginal  (partial)  likelihood  can  be  trusted. 
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Table  1.  Admissible  scenarios  for  which  unique  solution  exists.  Identify  the  subscripts  11  with  1, 
12  with  2,  21  with  3,  and  22  with  4,  and  deaths  are  marked  by  a  cross  x  and 
censoring  by  q  . 
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Appendix 


The  details  for  obtaining  Equation  4,  is  provided  as  follow.  For  the  sake  of 
simplicity,  let  us  denote  0  =  exp(a)  and  (p  =  exp(p).  Thus  using  Equation  1,  it 
follows. 


P(senario  1)  =  P(R  =  (1,3, 4, 2)) 


-  P(7’i  1  <  722  <  ^12  <  ^21 ) 


1 


0(0  +  —  +  —  +  — 
I  r\ 


(p  0  0(pA(p  0  0(pA(p  0 


0 


P(senario  2)  =  P(R  =  (1,4,3, 2)) 


-  P(7i  I  <  7^22  <  7*21  <  r|2  ) 


1 


A  0  9  1 

0(0  +  —  +  —  + - 

T  r\ 


^  (p  1 

(p  0  0(pA(p  0  d<pj 


.(p  QJ  (p 


P(senario  3)  =  P(R  =  (3,1,2,4)) 

=  P(7’i2  <  7’2i  <  T]!  <  r22) 

1 


0  0  9  1 

0(p  +  — +  — +  — , 

^  (p  0  Q(pJ 


'"o  9  lY 

0(p  +  — +  — 

0  0(pj 


1 


0(p  +  — , 

0(py  0(p 


1 


P(senario  4)  =  P(R  =  (4,1,2,3)) 

=  P(7’i2<7’21<7’22<7’„) 

1 


0  9  1 

0(p +  —  +  —  +  —  , 

<  (p  0  0(pj 


f0(p  +— +— 

I  0  0(pA 


1  A 

09  +  —  09 
09/ 
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P(senario  5)  =  P(R  =  (3, 2, 1,4)) 


=  p(rj,  <r,j<ri,<r22) 


1 


A  0  9  1 

0(0  +  —  +  —  +  — , 

cp  0  Q<pJ 


a  0  1 

0(0  +  —  + - 

^  (p  0(py 


1 


09+—, 

0(p/^  09 


1 


P(senario  6)  =  P(R  =  (4, 2, 1,3)) 


-  P(T2\  <  Ti2  <^22  <  7i  1 ) 

1 


„  0  9  1 

09  +  —  +  —  +  — 
9  0  09> 


D  0  1 

09  +  —  +  — 
9  09^ 


1  1 

09+  0(p 

09^ 


P(senario  7)  =  P(R  =  (2,3,4,1) 


=  P(7’22<7’i,<7’,2<r2,) 


1 


A  0  9  1 
09  +  —  +  —  +  — 

<  9  0  09A 


A  0  9 

09  +  —  +  — 

9  0> 


0  9 

^9  0> 


1 

0 


P(senario  8)  =  P(R  =  (2,3,4,1) 


-  P(722  <  rji  <T2i  <  7’|2) 


1 


Q  0  9  1 

09  +  —  +  —  +  — 

^  9  0  Q(pJ 


A  0  9 

09  +  —  +  — 


9  0A9  0 


0 
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Note  that  summing  up  the  above  probabilities  would  result  in  Equation  4.  Similarly, 
one  can  obtain  the  probability  of  admissible  scenarios  under  censoring. 
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